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Introduction 


Certain  database  query  systems,  especially  those  that  interpret 
natural  language  [1,  2],  do  not  require  the  user  to  have  comprehensive 
knowledge  of  the  database  schema  in  order  to  form  a meaningful  query. 
Instead,  the  query  is  used  to  reference  entities  in  the  database 
schema  and  a path  is  automatically  found  through  the  schema  which 
connects  those  entities.  There  is  an  obvious  problem  of  choice  when 
more  than  one  such  path  exists.  In  this  paper,  we  will  classify  those 

schemata  in  which  all  paths  are  equivalent,  and  then  derive  a method 

/ 

of  marking  a schema  in  order  to  determine  the  "natural"  paths  between 
two  entities  when  different  paths  give  rise  to  different  results. 


Functional  Dependencies 


In  one  sense  or  another,  the  notion  of  functional  dependency  is 
central  to  nearly  every  model  of  database  semantics.  A functional 
dependency  describes  the  existence  of  a function  between  two  classes 
of  objects  in  the  database  which  obtains  even  though  those  classes  may 
vary  in  time.  The  importance  of  functional  dependencies  to  the 
relational  model  has  been  fully  treated  by  Codd  and  others  [3,  4).  As 
another  example,  the  set  ownership  construct  of  DBTG  [5]  structures 
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establishes,  when  the  MANDATORY  option  is  employed,  a functional 
dependency  of  the  owning  class  upon  the  owned  class. 

To  start  with,  we  shall  assume  a very  simple  model  of  a database 
schema:  an  acyclic  graph  of  functional  dependencies.  For  example 
figure  1 shows  a set  of  dependencies  ’which  derive  from 


Figure  1.  A set  of  functional  dependencies 
the  observations  that  each  CLASS  is  an  instance  of  a COURSE  and  HAS  an 
INSTRUCTOR.  Each  COURSE  is  offered  in  a DEPARTMENT  and  each 
INSTRUCTOR  belongs  to  a department.  Figure  2 shows  a set  of 

relational  definitions  which  might  be  used  to  describe  this  set  of 
dependencies 


CLASS (HOUR,  ROOM#,  COURSE#,  INSTRUCTOR#,  ...) 
INSTRUCTOR (INSTRUCTOR# , DEPT#,  ...) 

COURSE (COURSE# , DEPT#,  ...) 

DEPARTMENT  (DEPT#  ) 


Figure  2. 
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Strictly  speaking,  the  relational  model  does  not  contain  any 
notion  of  the  dependency  of  the  tuples  of  one  relation  upon  those  of 
another,  and  only  a par t ial  dependency  is  given  by  the  various  natural 
joins.  However  an  extension  of  the  model  by  Smith  and  Smith  [6] 
describes  how  functional  dependencies  may  be  used  in  the  design  of  a 
relational  database.  For  the  Bachman  diagram  (DBTG  schema)  that 
corresponds  to  figure  1,  one  has  simply  to  invert  it. 

. It  is  apparent  that  there  are  two  paths  from  CLASS  to  DEPT  in 
figure  1,  and  it  may  be  that  there  is  a constraint  on  the  database 
that  permits  an  instructor  to  teach  only  those  courses  which  are 
offered  in  his  department.  That  is,  having  specified  a member  of 
CLASS,  the  same  member  of  DEPT  is  reached  no  matter  which  path  is 
taken.  We  shall  use  the  term  natural  to  describe  a diagram  with  this 
property.  In  order  to  demonstrate  that  natural  diagrams  do  ,in  fact, 
arise  naturally,  consider  the  schema  in  figure  3.  In  order  to  enroll 
in  a course,  a class  must  be  specified.  Moreover,  the  direct 
dependency  of  course  upon  enrollment  is  usually  necessary  to  prevent  a 
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Strictly  speaking,  the  relational  model  does  not  contain  any 
notion  of  the  dependency  of  the  tuples  of  one  relation  upon  those  of 
another,  and  only  a partial  dependency  is  given  by  the  various  natural 
joins.  However  an  extension  of  the  model  by  Smith  and  Smith  [6] 
describes  how  functional  dependencies  may  be  used  in  the  design  of  a 
relational  database.  For  the  Bachman  diagram  (DBTG  schema)  that 
corresponds  to  figure  1,  one  has  simply  to  invert  it. 

. It  is  apparent  that  there  are  two  paths  from  CLASS  to  DEPT  in 
figure  1,  and  it  may  be  that  there  is  a constraint  on  the  database 
that  permits  an  instructor  to  teach  only  those  courses  which  are 
offered  in  his  department.  That  is,  having  specified  a member  of 
CLASS,  the  same  member  of  DEPT  is  reached  no  matter  which  path  is 
taken.  We  shall  use  the  term  natur al  to  describe  a diagram  with  this 
property.  In  order  to  demonstrate  that  natural  diagrams  do  ,in  fact, 
arise  naturally,  consider  the  schema  in  figure  3.  In  order  to  enroll 
in  a course,  a class  must  be  specified.  Moreover,  the  direct 
dependency  of  course  upon  enrollment  is  usually  necessary  to  prevent  a 
student  enrolling  in  two  sections  of  the  same  course.  Je 
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Figure  3.  A "natural"  schema. 


An  extended  notion  of  functional  dependency  has  been  studied  by 
Armstrong  (7]  who  gives  axioms  for  what  is  termed  a "full"  set  of 
functional  dependencies.  The  relationship  between  functional 
dependencies  satisfying  these  axioms  and  a normalized  set  of  relations 
has  been  described  by  Beeri  and  by  Risannen  {8,9].  However,  in  the 
terms  described  above,  these  axioms  require  that  the  whole  schema  be 
natural  and,  with  Date  [10],  we  would  argue  that  this  requirement  is 
for  practical  purposes  too  restrictive.  Instead,  we  shall  examine 
natural  subschemata  in  a schema  which  is  not  itself  natural.  The 
reason  for  doing  this  is  both  as  an  aid  in  describing  "natural"  paths 
in  the  schema  and  also  to  investigate  the  problem  of  enforcing  the 
integrity  constraints  imposed  by  natural  subschemata.  The  formal 
results  which  follow  are  all,  with  minor  modifications,  applicable  to 
schemata  of  partial  functional  dependencies.  The  only  problem  with 
discussing  partial  dependencies  is  that,  even  in  a natural  schema,  not 
all  downward  paths  are  equivalent  for,  starting  at  a given  object, 
some  paths  may  not  lead  to  a defined  object. 


Natural  Schemata 


t We  shall  define  a schema  to  be  a collection  of  classes  V,  and 

functions  F such  that  any  f in  F is  a function  v^.  ->  vj  for  v-  , v.  in 
V.  With  each  schema  there  is  an  underlying  directed  graph  whose  arcs 
are  the  pairs  (v. , v> ) such  that  f:v-  ->  v-  is  in  F.  We  shall  impose 
the  additional  restriction  that  the  schema  is  acyclic,  that  is,  its 
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graph  contains  no  directed  cycles.  A subschema  is  defined  as  a 

I 

subgraph:  (V*,  F* ) is  a subschema  of  (V,  F)  if  v'c  V and  F'C  F.  The 

terms  "class"  and  "function"  for  a schema  will  be  used  inter chageably 

with  the  terms  "point"  and  "arc"  for  a directed  graph. 

Under  what  conditions  is  the  union  of  two  natural  subschemata 
itself  natural?  We  shall  show  that  under  certain  conditions  we  need 
no  further  knowledge  of  the  semantics  of  the  database:  this  property 
may  be  inferred  directly  from  graphical  properties  of  the  subschemata. 
Formally,  a schema  is  natural  if  for  any  composition  of  functions 

f.  u _>  v and  •<32.---9v  u~>v  ' 

f.  •fa.---f^x)  = 9,  -gx...gvl(x) 

for  all  x c-  u.  Let  p be  a point  of  a graph  G and  G*  a subgraph  of  G. 

Define  domfpjG^G)  to  be  the  set  of  points  utG7  such  that  there  is  a 

directed  path  in  G from  u to  Similarly  let  subfp^'.G)  be  the  set 

of  points  v in  G for  which  there  is  a directed  path  from  p to  v in  G. 

We  shall  say  a set  of  points  in  G is  connected  if  the  induced  subgraph 

on  those  points  is  connected. 

! I 

If 

Prop.  1.  Let  S,  and  be  natural  subschemata  of  S.  If 

dom(p,S,n  Sl(S;)n  sub(p,S,n  SX,S;)  is  connected  in  SOS.  for  all 
points  pe  S;  and  q t 3j  (i  4 j),  then  the  union  S,U  sx  is  natural. 

" Proof.  Without  loss  of  generality,  assume  that  ut  S,  , v c-  S,  and  that 

there  are  two  directed  paths  from  u to  v.  These  paths  must  both  meet 
S.hs,.  in  p,  and  px  which  are  in  H=dom (v  , S,0  a*  , Sx ) H sub  (u  ,S,n  Sx  ,S , ) , 
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and  this  subset  is  connected  in  S,riSl  by  the  conditions  of  the  prop. 
2.  Let  q,  .qx..qK  be  an  (undirected)  path  in  H which  connects  p,  and 
pz.  Now  make  the  inductive  assumption  that  the  subschema  consisting 
of  the  path  PQ  from  u to  v via  p,  and  any  path  P{  from  u to  v via  q; 
f is  natural.  Let  P;r(  be  any  path  from  u to  v via  q;  and  assume, 

again  w.l.o.g.  that  the  direction  of  the  arc  between  q^  and  qi4  , is 
given  by  (q(  , q-tl  ) . 


t \ A. 
\ \ 


■.sj- 


..  •••r 


r u q\  + i 
->2  1/ 


Figure  4 


Letting  denote  the  portion  of  P^  from  u to  q;  and  q'  the  portion 
from  q to  v and  using  the  notation  F(P)  to  denote  the  composition  of 
functions  associated  with  a directed  path  P,  we  have: 


F(P ;)  - P(Q;).F(Q[)  = F(Q-)  ,F(  (q-  ,q^(  )).F(Ql+1  ) 

because  S,  is  natural,  and  since  is  natural, 

t P(Q;),F((q.  ,qt>(  ))  .F(Q-t|  ) - F(Q-M  ) .F(q/w  )-F(p/rl  ) 


Page  7 


i 

The  Multiple  Path  Problem  in  Database  Schemata 

Thus,  by  induction  the  paths  from  u to  v via  pt  and  pz  (refer  to 
^ figure  4)  form  a natural  subschema,  and  the  same  is  true  for  any  pair 
of  paths  in  Stu  Sz.  Although  it  is  not  hard  to  check  that  the 
conditions  of  prop.  1 are  met,  there  are  simple  corollaries  which  may 

9 

be  used,  in  some  circumstances,  to  make  the  check  even  simpler.  In 
this  context,  we  shall  use  the  term  tree  to  refer  to  a directed  graph 
which  is  a tree  (see  Harary  [11])  when  the  direction  of  the  arcs  is 
^ removed. 

f 

Co  r . .1  If  S^.  and  Sz  are  natural  subschemata  such  that  S and  3,  D Sx  is  r 

a tree,  and  such  that  dom  ( p-  , S,  0 ) and  sub  ( p^  , S,r\  , s ^ ) are  trees 

for  (i  = l,2),  then  S,  u is  natur  al . // The  proof  of  this  follows 

® immediately  from  the  fact  that  the  intersection  of  two  sub-trees  in  a 
tree  is  connected  and  thus  the  conditions  of  Prop  1.  are  met.  This 
corollary  in  turn  may  be  used  to  provide  the  simpler  result: 


Cor  2 . If  the  intersection  of  two  natural  subschemata  is  a directed 
• path,  then  the  their  union  is  also  natural. 

For  example,  in  figure  5,  if  {1,2,3}  is  and  {1,3,4}  are  natural 
subschemata,  then  so  is  {1,2, 3, 4}  by  cor.  2.  If  in  addition  we  know 

e 

that  {2, 3, 4, 5}  is  natural,  cor.  1 ensures  that  the  whole  schema  is 
natural . 


O 


» Figure  5 


The  ability  to  recognize  natural  subschemata  is  useful  in  querying  the 
database,  but  their  presence  imposes  additional  constraints  on  the 
database  which  must  be  taken  into  account  when  performing  updates. 
For  example,  consider  the  problem  of  adding  a new  member  x to  a class 
v in  a natural  schema  S.  It  is  necessary  to  specify  the  values  of 
f ( x ) for  any  f:  v ->  u in  S.  In  addition  it  is  necessary  to  check 
that  the  addition  of  this  member  does  not  violate  the  naturalness  of 
S.  Fortunately,  in  many  cases  it  is  possible  to  limit  the  checking  to 

^ a natural  subschema  of  S.  Let  G be- directed  and  acyclic  and  p be  a 
root  for  G;  that  is  there  is  a directed  path  from  p to  each  point  in 
G.  We  shall  refer  to  a point  q of  G as  essential  if  there  is  no 
connected  subgraph  whose  removal  disconnects  p from  q.  BY  the 
essential  subschema  for  p,  we  mean  the  set  of  all  points  which  are 
essential  w.r.t.  p or  lie  on  a directed  path  from  p to  such  a point. 

» 
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T+vem*rrnr  2.  Let  p be  the  root  of  a subschema  S,  and  let  S'  be  its 
essential  subschema.  Then  S and  S'  satisfy  the  conditions  of  prop.  1. 

The  proof  of  this  result  is  straightforward,  and  its  its 
♦ usefulness  is  that  we  may  limit  consistency  checks  to  the  essential 
subschema  of  a class  v when  adding  a new  member  to  v.  For  example, 
when  adding  a member  to  class  1 in  figure  5,  it  is  only  necessary  to 
f check  that  this  update  preserves  naturalness  of  subschemata  {1,2,3} 
and  {1,3,4}.  Modifications  to  f:  u ->  v should  be  checked  for 
consistency  in  the  essential  subschemata  for  u,  but  note  that  such 
) modifications  are  severely  limited.  For  example,  in  if  figure  1 is 
interpreted  as  a natural  schema,  modifications  to  the  dependency 
CLASS  ->  INSTRUCTOR  may  only  take  place  within  the  inverse  image  of 
some  member  of  DEPARTMENT  and  that  modifications  to 
INSTRUCTOR  ->  DEPARTMENT  are  impossible  when  the  inverse  image  of  a 

■ 

member  of  INSTRUCTOR  is  non-empty.  Natural  schemata  impose  no 
restrictions  on  deletions,  but  as  Smith  and  Smith  point  out  15],  the 
removal  of  a member  of  u may  require  the  removal  of  a substantial 
number  of  records  in  dom(u,S,S). 

I 


i 


Descr ibing  Na  tural  Paths 

We  now  turn  to  the  problem  of  describing  natural  paths  through  a 
schema  which  is  not  itself  natural.  If  we  now  interpret  figure  1,  for 
example,  as  not  being  a natural  schema,  then  the  choice  of  which  path 
to  take  between  CLASS  and  DEPARTMENT  is  important:  presumably  the 
correct  path  is  via  COURSE.  In  this  case,  all  paths  except  one  are 
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natural.  How  are  we  to  describe  the  natural  path  between  two  classes 


in  a schema? 

One 

way 

is  simply 

to  keep 

a description  of 

the 

appr opr iate 

path 

for 

each  pair 

of  classes 

in  the  schema. 

This 

solution,  apart 

from 

being  cumbersome  to 

implement,  places 

a 

considerable  extra  burden  upon  the  database  designer  who  must  describe 
these  paths.  The  solution  proposed  here  is  to  mark  those  parts  of  the 
schema  which  are,  in  some  sense,  "unnatural".  First  we  should  note 
that  we  have  not  yet  defined  the  term  na t ur a 1 for  a path.  The  natural 
paths  from  u to  v cannot  be  intrinsically  defined;  they  are  simply 
the  paths  of  choice  in  a query  that  given  a member  of  u,  requests  a 
member  of  v.  The  only  restriction  that  can  be  placed  on  natural  paths 
is : 

Condition  1.  The  union  of  all  natural  paths  from  u to  v form  a 
natural  subschemata. 


It  is  suggested  that  the  characterization  of  natural  paths  may  be 
performed  as  follows.  Certain  functions  (arcs)  in  the  schema  are 
labelled  with  classes  (points).  If  f;  v ->  w is  labelled  with  u,  this 
is  interpreted  as  meaning  that  no  natural  paths  from  u may  pass  along 

\ VI  O bin.ja.-  c*.  /»c  l\ ) 

the  arc  f.  For  example,  in  figure  1,  the  arc  from  INSTRUCTOR  to  DEPT 
may  be  marked  with  CLASS.  This  means  that  there  is  no  natural  path 
from  CLASS  to  DEPT  through  INSTRUCTOR,  however  the  elementary  path 
from  INSTRUCTOR  to  DEPT  is  itself  natural.  Note  that  only  arcs  below 
a given  point  in  the  schema  are  marked  with  u and  that  if  the  schema 
is  marked  with  u wherever  it  is  appropriate  to  do  so,  by  condition  1 
the  union  of  all  paths  from  u which  do  not  traverse  an  arc  labelled 
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with  u,  form  a natural  subschema, 


Figure  6 shows  a more  complicated  example  of  this  kind  of 
marking.  A second  condition  arises  from  the  desire  to  minimize  the 
number  of  labels  required  to  describe  the  unnatural  paths. 


Condition  2.  If  P is  a path  from  v to  w which  is  not  natural,  then 
any  path  from  u to  w with  P as  a final  segment  is  also  unnatural. 


Thus,  in  figure  6,  a path  from  ENROLLMENT  via  CLASS  to  DEPARTMENT 
which  includes  the  arc  from  FACULTY  to  DEPARTMENT,  cannot  be 
considered  natural  because  this  arc  is  marked  with  CLASS. 


(tNlU'LL  rt  L 


l s_(,  0£w" 


Vi  A 5< 


\ C\  Nsrrtvc 

(VcOK5{^  ^ 


1 1-  ( ( 

(ffTcOtrvJ 


w 


Qhf  HirnV£. 


'fL  K Sq  >sl 


Figure  6 


This  marking  can  be  readily  used  in  any  path  finding  algorithm 
which  attempts  to  connect  two  points  in  the  schema.  If  u , u, , ut . . . u^ 
is  being  considered  as  an  initial  segment  of  a path  from  u to  v and 
there  is  an  arc  from  u-  to  u^,.,  then  the  path  u,  u, , u„ , u**,  may 


I 


The  Multiple  Path  Problem  in  Database  Schemata 


Page  12 


only  be  considered  if  the  arc  (un,u„T))  is  not  labelled  with  u or  u- 
(l<i<n) . This  characterization  of  a natural  path  means  that  there  may 
be  a natural  subschema  S which  contains  unnatural  paths.  For  example 
the  subschema  on  {STUDENT,  ADVISER,  FACULTY,  DEPARTMENT}  is  natural, 
and  trivially,  prop.  1 allows  us  to  adol  enrollment  to  this  and 
preserve  the  naturalness  of  the  subschema.  But  this  subschema 
contains  unnatural  paths. 

In  concluding  this  section,  it  should  be  remarked  that  we  have  so 
far  employed  a particularly  simple  model  of  a database  schema  and  that 
enrichment  of  this  model  is  likely  to  add  more  techniques  for  the 
construction  of  natural  paths.  A particular  case  in  point  is  that  of 
keys . In  terms  of  functional  dependencies,  suppose  that  u is  a class 
in  S and  f,  , f ,...f  are  (total)  functions  which  respectively  map  u 

I Z 

into  v,  , v , ...v,^.  We  shall  say  that  { f ( , f^  , . . . f^}  is  a key  for  u if 
there  exists  a partial  function 

F:  v x v^x. . .x  v^  ->  u 

such  that 

1.  F(f,  (x),  f Jx)  , . . .f^x) ) = x for  all  x in  u. 

2.  There  is  no  subset  of  {f  , f z , . . . f^}  with  this  property. 

In  figure  6,  keys  have  been  marked  with  a solid  triangle.  A key 

function,  one  which  is  a member  of  a key,  defines  a close  relation 
between  classes  in  the  schema.  In  particular,  key  functions  appear  to 
be  good  candidates  for  inclusion  in  natural  paths.  WHile  it  is  quite 
easy  to  construct  examples  in  which  there  is  a path  through  key 


diagram  does  not  tell  us  whether  a student  can  also  be  a faculty 
member,  in  other  words  we  have  no  way  of  knowing  whether  the  inverse 
images  of  STUDENT  and  FACULTY  in  PERSON  have  a non-empty  intersection. 
Smith  and  Smith(12]  have  also  described  the  concept  of  generalization 


in  which  data  values  may  be  used  in  cases  where  the  intersection  is 
t> 

empty,  to ^ which  inverse  image  an  object  belongs.  Such  knowledge  is 
clearly  of  great  value  in  determining  a natural  path,  but  now 
introduces  the  complication  that  the  natural  path  required  by  a query 
may  depend  on  the  data  associated  with  that  query  and  may  not  be  found 
f by  inspection  of  the  schema  alone. 


A second  problem  lies  in  the  treatment  of  paths  which  go  "up  and 
down"  the  schema.  In  this  paper  only  directed  paths  have  been 
treated,  however,  database  queries  may  involve  traversing  the  schema 
in  more  than  one  direction.  For  example,  a path  between  INSTRUCTOR 
and  COURSE  could  be  asked  for  from  figure  1.  in  this  case,  the  path 
of  choice  would  require  going  up  to  CLASS  and  down  to  COURSE.  While 


«■  * 


I 


I 
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the  techniques  presented  here  may  greatly  reduce  the  number  of  choices 
| for  such  paths,  even  in  a natural  subschema,  different  undirected 
paths  will  produce  different  results.  Some  additional  constructs  are 
needed  to  resolve  the  choice  in  these  circumstances. 

» 
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